Abstract

Given an undirected graph $G$ with $m$ edges, $n$ vertices, and non-negative edge
weights, and given an integer $k \ge 2$, we show that a $(2k-1)$-approximate distance
oracle for $G$ of size $O(kn^{1+1/k})$ and with $O(\log k)$ query time can be constructed in
$O(\min\{kmn^{1/k}, \sqrt{k}m + kn^{1+c/\sqrt{k}}\})$ time for some constant $c$. This improves the $O(k)$
query time of Thorup and Zwick. For any $0 < \epsilon \le 1$, we also give an oracle of
size $O(kn^{1+1/k})$ that answers $((2+\epsilon)k)$-approximate distance queries in $O(1/\epsilon)$ time.
At the cost of a $k$-factor in size, this improves the $128k$ approximation achieved by
the constant query time oracle of Mendel and Naor and approaches the best possible
tradeoff between size and stretch, implied by a widely believed girth conjecture of
Erdős. We can match the $O(n^{1+1/k})$ size bound of Mendel and Naor for any constant
$\epsilon > 0$ and $k = O(\log n/\log\log n)$.
1 Introduction



The practical need for efficient algorithms to answer shortest path (distance) queries in 
graphs has increased significantly over the years, in large part due to emerging GPS navi- 
gation technology and other route planning software. Classical algorithms like Dijkstra do 
not scale well as they may need to explore the entire graph just to answer a single query. 
As road maps are typically of considerable size, developing more efficient algorithms and 
data structures has received a great deal of attention from the research community. 

A distance oracle is a data structure that answers distance queries in time independent 
of the size of the graph. A naive way of achieving this is to precompute and store all-pairs 
shortest path distances in a look-up table, allowing subsequent queries to be answered in 
constant time. The obvious drawback is of course the huge space requirement which is 
quadratic in the number of vertices of the graph, as well as the long time for precomputing 
all-pairs shortest path distances. 
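
The naive look-up table described above can be sketched as follows. This is only an illustration of the quadratic-space baseline, not part of the oracles developed in this paper; the function names and the adjacency-list format (`adj[x]` is a list of `(neighbor, weight)` pairs) are assumptions made for the example.

```python
import heapq


def dijkstra(adj, source):
    """Single-source shortest path distances for non-negative edge weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x]:
            if d + w < dist.get(y, float("inf")):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist


def build_exact_oracle(adj):
    """Precompute the full n-by-n distance table (quadratic space)."""
    return {u: dijkstra(adj, u) for u in adj}


def query(table, u, v):
    """Constant-time look-up after preprocessing."""
    return table[u][v]


# Hypothetical example graph: a path a - b - c plus a heavier direct edge a - c.
example_adj = {
    "a": [("b", 1.0), ("c", 5.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("a", 5.0), ("b", 2.0)],
}
example_table = build_exact_oracle(example_adj)
```

Every query is a dictionary look-up, but the table holds one entry per ordered vertex pair, which is exactly the space cost that approximate oracles avoid.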

Thorup and Zwick [11] considered approximate distance oracles. They showed that
by allowing some small error in the distances reported, both space and preprocessing can
be dramatically improved while still ensuring constant query time. More precisely, for an
undirected weighted graph with $m$ edges and $n$ vertices, a data structure of size $O(kn^{1+1/k})$
can be constructed in $O(kmn^{1/k})$ time which reports shortest path distances stretched by a
factor of at most $2k-1$ in $O(k)$ time. The tradeoff between size and stretch is optimal (up
to a factor of $k$ in space), assuming a widely believed and partially proved girth conjecture
of Erdős [1].

Time and space in [11] are expected bounds; Roditty, Thorup, and Zwick [9] gave a
deterministic oracle with only a small increase in preprocessing.

Baswana and Kavitha [2] showed how to obtain $O(n^2)$ preprocessing for $k \ge 3$, an
improvement for dense graphs. Subquadratic time was recently obtained for $k \ge 6$ and
$m = o(n^2)$ [13]. Pătraşcu and Roditty [8] gave an oracle of size $O(n^2/\alpha^{1/3})$ and stretch
2 for a graph with $m = n^2/\alpha$ edges. Furthermore, they showed that a size $O(n^{5/3})$
oracle with multiplicative stretch 2 and additive stretch 1 exists for unweighted graphs.
Baswana, Gaur, Sen, and Upadhyay [1] also gave oracles with both multiplicative and
additive stretch.

Although the oracles above for general $k$ answer queries in time independent of the
graph size, query time still depends on stretch. Mendel and Naor [5] asked the question of
whether good approximate distance oracles exist with query time bounded by a universal
constant. They answered this in the affirmative by giving an oracle of size $O(n^{1+1/k})$,
stretch at most $128k$, query time $O(1)$, and preprocessing time $O(n^{2+1/k}\log n)$. According
to Naor and Tao [7], with a more careful analysis of the arguments in [5], one can improve
stretch to roughly $16k$ but not by much more. The $O(n^{2+1/k}\log n)$ preprocessing time was
later improved by Mendel and Schwob to $O(mn^{1/k}\log^3 n)$; for an $n$-point metric space,
they obtain a bound of $O(n^2)$.^1

We refer the reader to the survey by Sen [10] on distance oracles and on the related 
area of spanners. 

Our contributions: Our first contribution is an improvement of the query time of the
Thorup-Zwick oracle from $O(k)$ to $O(\log k)$ without increasing space, stretch, or
preprocessing time. We achieve this by showing how to apply binary search on the
bunch-structures introduced by Thorup and Zwick. Our improved query algorithm is very simple
to describe and straightforward to implement. It can easily be incorporated into our recent
distance oracle [13], giving improved preprocessing.

^1 We thank an anonymous referee of an earlier version of the paper for mentioning this improvement.

Table 1: Performance of distance oracles in weighted undirected graphs.

Our second contribution is an approximate distance oracle with universally constant
query time whose size is $O(kn^{1+1/k})$ and whose stretch can be made arbitrarily close to
the optimal $2k-1$ (when $k = \omega(1)$): for any positive $\epsilon \le 1$, we give an oracle of size
$O(kn^{1+1/k})$, stretch $(2+\epsilon)k$, and query time $O(1/\epsilon)$. For $k = O(\log n/\log\log n)$ and
constant $\epsilon$, space can be improved to $O(n^{1+1/k})$, matching that of Mendel and Naor.^2
To achieve this result, the main idea is to first query the Mendel-Naor oracle to get an
$O(k)$-approximate distance and then refine this estimate in $O(1/\epsilon)$ iterations using the
bunch-structures of Thorup and Zwick. Our results are summarized in Table 1.

Note that we are interested in non-constant $k$ only; if $k = O(1)$, the Thorup-Zwick
oracle is optimal (assuming the girth conjecture) since it has size $O(n^{1+1/k})$, stretch $2k-1$,
and query time $O(1)$.

Organization of the paper: In Section 2, we introduce notation and give some basic
definitions and results. Our oracle with $O(\log k)$ query time is presented in Section 3.
This is followed by our constant time oracle in Section 4: first we present a generic
algorithm in Section 4.1 that takes as input a large-stretch distance estimate and outputs a
refined estimate. Some technical results are presented in Section 4.2 that will allow us to
combine this generic algorithm with the Mendel-Naor oracle to form our own oracle. We
describe preprocessing and query in detail in Sections 4.3 and 4.4, and we bound time and
space requirements in Section 4.5. In Section 4.6, we show how to improve preprocessing
compared to that in [6]. Finally, we conclude in Section 5.

2 Preliminaries 

Throughout the paper, $G = (V, E)$ is an undirected connected graph with non-negative
edge weights and with $m$ edges and $n$ vertices. For $u, v \in V$, we denote by $d_G(u,v)$ the
shortest path distance between $u$ and $v$.

Sometimes we consider list representations of sets. We denote by $S[i]$ the $i$th entry of
some chosen list representation of a set $S$, $i \ge 0$. For $x > 0$, $\log x$ is the base 2 logarithm
of $x$.

^2 This covers almost all values of $k$ that are of interest as the Mendel-Naor oracle has $O(n)$ space
requirement for $k = \Omega(\log n)$.
Algorithm $\mathrm{dist}_k(u, v, i)$

1. $w \leftarrow p_i(u)$; $j \leftarrow i$
2. while $w \notin B_v$
3.     $j \leftarrow j + 1$
4.     $(u, v) \leftarrow (v, u)$
5.     $w \leftarrow p_j(u)$
6. return $d_G(w,u) + d_G(w,v)$

Figure 1: Answering a distance query, starting at sample level $i$.

The following definitions are taken from [11] and we shall use them throughout the
paper. Let $k \ge 1$ be an integer and form sets $A_0, \ldots, A_k$ with $V = A_0 \supseteq A_1 \supseteq A_2 \supseteq \cdots \supseteq
A_k = \emptyset$. For $i = 1, \ldots, k-1$, set $A_i$ is formed by picking each element of $A_{i-1}$ independently
with probability $n^{-1/k}$. Set $A_i$ has expected size $O(n^{1-i/k})$ for $i = 0, \ldots, k-1$. For each
vertex $u$ and each $i = 1, \ldots, k-1$, $p_i(u)$ denotes the vertex of $A_i$ closest to $u$ (breaking
ties arbitrarily); since $A_0 = V$, we also have $p_0(u) = u$. We define a bunch $B_u$ as

$$B_u = \bigcup_{i=0}^{k-1} \{v \in A_i \setminus A_{i+1} \mid d_G(u,v) < d_G(u,p_{i+1}(u))\},$$

where we let $d_G(u,p_k(u)) = \infty$. Thorup and Zwick showed how to compute all bunches in
$O(kmn^{1/k})$ time and showed that each of them has expected size $O(kn^{1/k})$ for a total of
$O(kn^{1+1/k})$. The following lemma states some simple but important results about bunches.

Lemma 1. Let $u, v \in V$ be distinct vertices and let $0 \le i < k-1$. If $p_i(v) \notin B_u$ then
$d_G(u,p_{i+1}(u)) \le d_G(u,p_i(v))$. Furthermore, $A_{k-1} \subseteq B_u$. In particular, $p_{k-1}(v) \in B_u$.

Algorithm $\mathrm{dist}_k(u,v,i)$ in Figure 1 is identical to the query algorithm of Thorup and
Zwick except that we do not initialize the level index to 0 but allow any start value $i$. We shall
use this generalized algorithm in our analysis in the following.
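
The definitions above and the query of Figure 1 can be made concrete with a small sketch. It is a brute-force illustration under stated assumptions, not the Thorup-Zwick construction itself: all-pairs distances are computed with Dijkstra instead of the $O(kmn^{1/k})$ preprocessing, bunches are plain Python sets rather than hash tables with stored distances, and the sampling is repeated until $A_{k-1}$ is non-empty. All function names are hypothetical.

```python
import heapq
import math
import random


def dijkstra(adj, source):
    """Exact distances from source; adj[x] is a list of (neighbor, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x]:
            if d + w < dist.get(y, math.inf):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist


def sample_levels(vertices, k, rng):
    """Sample V = A_0 >= A_1 >= ... >= A_{k-1}; retry until A_{k-1} is non-empty."""
    prob = len(vertices) ** (-1.0 / k)
    while True:
        levels = [set(vertices)]
        for _ in range(1, k):
            levels.append({x for x in sorted(levels[-1]) if rng.random() < prob})
        if levels[-1]:
            return levels


def build_bunches(adj, levels):
    """Return (dist, p, bunch) with p[u][i] = p_i(u) and bunch[u] = B_u."""
    k = len(levels)
    dist = {u: dijkstra(adj, u) for u in adj}  # brute force, illustration only
    p, bunch = {}, {}
    for u in adj:
        p[u] = [min(sorted(levels[i]), key=lambda w: dist[u][w]) for i in range(k)]
        # d_G(u, p_{i+1}(u)) for i = 0..k-1, where d_G(u, p_k(u)) is infinite
        next_dist = [dist[u][p[u][i + 1]] for i in range(k - 1)] + [math.inf]
        bunch[u] = set()
        for i in range(k):
            deeper = levels[i + 1] if i + 1 < k else set()
            bunch[u] |= {w for w in levels[i] - deeper if dist[u][w] < next_dist[i]}
    return dist, p, bunch


def dist_k(dist, p, bunch, u, v, i=0):
    """Query of Figure 1, started at sample level i."""
    w, j = p[u][i], i
    while w not in bunch[v]:
        j += 1
        u, v = v, u
        w = p[u][j]
    return dist[w][u] + dist[w][v]
```

With start level 0 this is the Thorup-Zwick query, so the returned value lies between $d_G(u,v)$ and $(2k-1)d_G(u,v)$ for any sampled hierarchy; the loop always stops by level $k-1$ because $A_{k-1} \subseteq B_v$.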

3 Oracle with $O(\log k)$ Query Time

In this section, we show how to improve the $O(k)$ query time of the Thorup-Zwick oracle
to $O(\log k)$. Let $\mathcal{I}$ be the index sequence $0, \ldots, k-1$. The idea is to identify $r = O(\log k)$
subsequences $(\mathcal{I}_1 = \mathcal{I}) \supset \mathcal{I}_2 \supset \cdots \supset \mathcal{I}_r$ of $\mathcal{I}$ in that order, where for $j = 2, \ldots, r$,
$|\mathcal{I}_j|$ is at most a constant fraction of $|\mathcal{I}_{j-1}|$. Each subsequence $\mathcal{I}_j$ has the property that
$\mathrm{dist}_k$ applied to the beginning of it outputs a desired $(2k-1)$-approximate distance in
$O(|\mathcal{I}_j|)$ time. We apply binary search to identify the subsequences, with each step taking
constant time. The final subsequence $\mathcal{I}_r$ has constant length and $\mathrm{dist}_k$ is applied to it to
compute a $(2k-1)$-distance estimate in constant additional time.

We define a class of such subsequences in the following. For each vertex $u$ and $0 \le i <
k-2$, define $\delta_i(u) = d_G(u,p_{i+2}(u)) - d_G(u,p_i(u))$. For vertices $u$ and $v$, an index $j \in \mathcal{I}$ is
$(u,v)$-terminal if

1. $j = k-1$ (in which case $p_j(u) \in B_v$) or

2. $j < k-1$ is even and either $p_j(u) \in B_v$ or $p_{j+1}(v) \in B_u$.
Note that if an index $j$ is $(u,v)$-terminal, $\mathrm{dist}_k(u,v,i)$ terminates if it reaches $j$ or $j+1$.
We say that a subsequence $\mathcal{I}' = i_1, \ldots, i_2$ of $\mathcal{I}$ is $(u,v)$-feasible if

1. $i_1$ is even,

2. $d_G(u,p_{i_1}(u)) \le i_1 d_G(u,v)$, and

3. $i_2$ is $(u,v)$-terminal.

The following lemma implies that $\mathrm{dist}_k$ answers a $(2k-1)$-approximate distance query for
$u$ and $v$ when applied to a $(u,v)$-feasible sequence.

Lemma 2. Let $i_1, \ldots, i_2$ be a $(u,v)$-feasible sequence. Then $\mathrm{dist}_k(u,v,i_1)$ gives a $(2k-1)$-
approximate $uv$-distance in $O(i_2 - i_1)$ time.

Proof. The time bound follows from the assumption that $i_2$ is $(u,v)$-terminal. The stretch
bound follows from the analysis of Thorup and Zwick for their query algorithm $\mathrm{dist}_k(u,v)$
so we omit the details here. They show that each iteration increases $d_G(w,u)$ by at
most $d_G(u,v)$. Since $d_G(u,p_{i_1}(u)) \le i_1 d_G(u,v)$, the triangle inequality and an inductive
argument give the desired result. □

Lemma 3. $\mathcal{I}$ is $(u,v)$-feasible for any vertices $u$ and $v$.

The following lemma allows us to apply binary search to find a $(2k-1)$-approximate
distance estimate of $d_G(u,v)$.

Lemma 4. Let $i_1, \ldots, i_2$ be a $(u,v)$-feasible sequence and let $i$ be even, $i_1 + 2 \le i \le i_2 - 2$.
Let $j$ be an even index in subsequence $i_1, \ldots, i-2$ that maximizes $\delta_j(u)$. If $p_j(u) \notin B_v$
and $p_{j+1}(v) \notin B_u$ then $i, \ldots, i_2$ is $(u,v)$-feasible. Otherwise, $i_1, \ldots, j$ is $(u,v)$-feasible.

Proof. If $p_j(u) \in B_v$ or $p_{j+1}(v) \in B_u$ then $j$ is $(u,v)$-terminal. Since $i_1, \ldots, i_2$ is $(u,v)$-
feasible, so is $i_1, \ldots, j$.

Now assume that $p_j(u) \notin B_v$ and $p_{j+1}(v) \notin B_u$. Then $d_G(v,p_{j+1}(v)) \le d_G(v,p_j(u))$
and $d_G(u,p_{j+2}(u)) \le d_G(u,p_{j+1}(v))$ by Lemma 1. Applying the triangle inequality twice
yields

$$d_G(u,p_{j+2}(u)) \le d_G(u,p_{j+1}(v)) \le d_G(u,v) + d_G(v,p_{j+1}(v)) \le d_G(u,v) + d_G(v,p_j(u)) \le 2d_G(u,v) + d_G(u,p_j(u))$$

so $\delta_j(u) = d_G(u,p_{j+2}(u)) - d_G(u,p_j(u)) \le 2d_G(u,v)$.

Let $\mathcal{I}'$ be the set of even indices $i_1, i_1+2, i_1+4, \ldots, i-2$. Since $i_1, \ldots, i_2$ is $(u,v)$-
feasible, $d_G(u,p_{i_1}(u)) \le i_1 d_G(u,v)$. By the choice of $j$,

$$d_G(u,p_i(u)) = d_G(u,p_{i_1}(u)) + \sum_{j' \in \mathcal{I}'} \delta_{j'}(u) \le i_1 d_G(u,v) + |\mathcal{I}'| \max_{j' \in \mathcal{I}'} \delta_{j'}(u)$$

$$= i_1 d_G(u,v) + \frac{i - i_1}{2}\delta_j(u) \le i_1 d_G(u,v) + (i - i_1)d_G(u,v) = i\,d_G(u,v).$$

Hence, since $i_1, \ldots, i_2$ is $(u,v)$-feasible, so is $i, \ldots, i_2$. □

We can now show our first main result. 

Theorem 1. For an integer $k \ge 2$, a $(2k-1)$-approximate distance oracle of $G$ of
size $O(kn^{1+1/k})$ and $O(\log k)$ query time can be constructed in
$O(\min\{kmn^{1/k}, \sqrt{k}m + kn^{1+c/\sqrt{k}}\})$ time for some constant $c$.
Proof. We obtain bunch $B_u$ for each vertex $u$ in a total of $O(kmn^{1/k})$ time using the
Thorup-Zwick construction. The following additional preprocessing is done for $u$ to
determine the $(u,v)$-subsequences of $\mathcal{I}$ that are needed. Let $\mathcal{I}' = i_1, \ldots, i_2$ be the current
sequence considered; initially, $\mathcal{I}' = \mathcal{I}$. Pick an even index $i$, $i_1 + 2 \le i \le i_2 - 2$, such that
$i_1, \ldots, i$ and $i, \ldots, i_2$ have (roughly) the same size and find an even index $j$ in $i_1, \ldots, i-2$
which maximizes $\delta_j(u)$. Then recurse on subsequences $i_1, \ldots, j$ and $i, \ldots, i_2$. The
recursion stops when a sequence of length at most $\log k$ is reached. We show below how to
identify these indices $j$ in $O(k)$ time which is $O(kn)$ over all $u$.

Now, to answer a distance query for vertices $u$ and $v$, we do binary search on sequences
$\mathcal{I}' = i_1, \ldots, i_2$ generated. We start the search with $\mathcal{I}' = \mathcal{I}$ and check if both $p_j(u) \notin B_v$
and $p_{j+1}(v) \notin B_u$. If so, we continue the search on subsequence $i, \ldots, i_2$. Otherwise, we
continue the search on $i_1, \ldots, j$. We stop when reaching a sequence of length at most $\log k$.
By Lemmas 3 and 4, this subsequence is $(u,v)$-feasible. Applying $\mathrm{dist}_k$ to it outputs a
$(2k-1)$-approximate distance estimate of $d_G(u,v)$ by Lemma 2.

Binary search takes $O(\log k)$ time. Since we end up with a $(u,v)$-feasible sequence of
length at most $\log k$, $\mathrm{dist}_k$ applied to it takes $O(\log k)$ time. Hence, query time is $O(\log k)$.
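
The binary search just described can be sketched as follows. This is a simplified illustration under several assumptions, not the data structure of the theorem: preprocessing is brute force, the maximizer of $\delta_j(u)$ is found by a linear scan at query time (so the sketch does not achieve $O(\log k)$ query time, which requires the precomputed indices discussed next), the split index is chosen by a simple rule rather than the balanced one, and the search stops at a sequence of constant length instead of length $\log k$. All names are hypothetical.

```python
import heapq
import math


def dijkstra(adj, source):
    """Exact distances from source; adj[x] is a list of (neighbor, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in adj[x]:
            if d + w < dist.get(y, math.inf):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist


def preprocess(adj, levels):
    """Brute-force stand-in for the Thorup-Zwick preprocessing."""
    k = len(levels)
    dist = {u: dijkstra(adj, u) for u in adj}
    p, bunch = {}, {}
    for u in adj:
        p[u] = [min(sorted(levels[i]), key=lambda w: dist[u][w]) for i in range(k)]
        next_dist = [dist[u][p[u][i + 1]] for i in range(k - 1)] + [math.inf]
        bunch[u] = set()
        for i in range(k):
            deeper = levels[i + 1] if i + 1 < k else set()
            bunch[u] |= {w for w in levels[i] - deeper if dist[u][w] < next_dist[i]}
    return dist, p, bunch


def dist_k(dist, p, bunch, u, v, i):
    """Query of Figure 1, started at sample level i."""
    w, j = p[u][i], i
    while w not in bunch[v]:
        j += 1
        u, v = v, u
        w = p[u][j]
    return dist[w][u] + dist[w][v]


def query(dist, p, bunch, u, v):
    """Narrow a (u,v)-feasible sequence i1..i2 as in Lemma 4, then run dist_k from i1."""
    k = len(p[u])
    i1, i2 = 0, k - 1                    # the whole index sequence is feasible (Lemma 3)
    while i2 - i1 >= 4:
        i = i1 + 2 * ((i2 - i1) // 4)    # even, with i1 + 2 <= i <= i2 - 2
        # Linear scan for the even j in i1..i-2 maximizing delta_j(u).
        j = max(range(i1, i - 1, 2),
                key=lambda t: dist[u][p[u][t + 2]] - dist[u][p[u][t]])
        if p[u][j] not in bunch[v] and p[v][j + 1] not in bunch[u]:
            i1 = i                       # i..i2 is (u,v)-feasible
        else:
            i2 = j                       # j is (u,v)-terminal, so i1..j is feasible
    return dist_k(dist, p, bunch, u, v, i1)
```

Since $i_1$ stays even and satisfies $d_G(u,p_{i_1}(u)) \le i_1 d_G(u,v)$ throughout, the value returned is a $(2k-1)$-approximation for any nested hierarchy with non-empty $A_{k-1}$.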

The oracle in [13] with $O(\sqrt{k}m + kn^{1+c/\sqrt{k}})$ preprocessing time also constructs bunches
and applies linear search in these to answer distance queries in $O(k)$ time. Our binary
search algorithm can immediately be plugged in instead.

What remains is to show how the preprocessing above for $u$ can be done in $O(k)$ time.
Let us call a subsequence of $\mathcal{I}$ canonical if it is obtained during the following procedure:
start with the subsequence $\mathcal{I}'$ of $\mathcal{I}$ consisting of the even indices. Then find an index
$i \in \mathcal{I}'$ that partitions $\mathcal{I}'$ into two (roughly) equal-size subsequences (both containing $i$),
and recurse on each of them; the recursion stops when a subsequence consisting of two
indices is obtained. We keep a binary tree $T$ reflecting the recursion, where each node of $T$
is associated with the canonical subsequence generated at that step in the recursion. From
this procedure, we identify (the endpoints of) all canonical subsequences in $O(k)$ time. A
bottom-up $O(k)$ time algorithm in $T$ can then identify, for each canonical subsequence
$\mathcal{I}' = i_1, i_1+2, \ldots, i_2$, an index $j = j(\mathcal{I}')$ in $i_1, i_1+2, \ldots, i_2-2$ that maximizes $\delta_j(u)$.

Now consider a (not necessarily canonical) subsequence $\mathcal{I}' = i_1, i_1+2, \ldots, i_2$ of $\mathcal{I}$ with
indices $i_1 < i_2$ even. We can find $O(\log k)$ canonical subsequences whose union is $\mathcal{I}'$ as
follows: let $\ell_1$ and $\ell_2$ be the leaves of $T$ associated with canonical subsequences $i_1, i_1+2$
and $i_2-2, i_2$, respectively. Let $P$ be the path in $T$ from the parent of $\ell_1$ to the parent of
$\ell_2$ and let $\mathcal{N}$ be the set of nodes in $T \setminus P$ having a parent in $P$. Then it is easy to see that
the $O(\log k)$ canonical subsequences associated with nodes in $\mathcal{N}$ have $\mathcal{I}'$ as their union.
It follows that finding the desired index $j$ for $\mathcal{I}'$ takes $O(\log k)$ time as it can be found
among the $j$-indices for canonical subsequences associated with nodes in $\mathcal{N}$.

In our preprocessing for vertex $u$ described in the beginning of the proof, we only need
to find $j$-indices for $O(k/\log k)$ subsequences since the recursion stops when a subsequence
of length at most $\log k$ is found. Total preprocessing for $u$ is thus $O(k)$, which is $O(kn)$
over all $u$. This completes the proof. □

4 Oracle with Constant Query Time 

Let $0 < \epsilon \le \frac{1}{2}$ be given. In this section, we show how to achieve stretch $2(1+\epsilon)k - 1$,
query time $O(1/\log(1+\epsilon)) = O(1/\epsilon)$,^3 and space $O(kn^{1+1/k})$. Initially, we aim for a
preprocessing bound of $O(n^{2+1/k}\log n)$, matching that in [5]. In Section 4.6, we improve
this to the bound stated in Table 1.

^3 Let $x = 1/\epsilon > 1$. Since $\ln$ is concave, $\ln(1+\epsilon) = \ln(x+1) - \ln x \ge \frac{d}{dx}\ln(x+1) = 1/(x+1) \ge \frac{1}{2}\epsilon$,
which implies $1/\log(1+\epsilon) = O(1/\epsilon)$.

Algorithm $\mathrm{refine\_dist}_{\alpha,\epsilon}(u, v, d_{uv})$

1. $d_u \leftarrow d_{uv}$
2. $i_u \leftarrow \mathrm{even}_u(d_u)$
3. if not refine_further$(u, v, i_u)$ then return $d_u$
4. $i \leftarrow 0$
5. while refine_further$(u, v, i_u)$ and $i < \lceil \log(2\alpha)/\log(1+\epsilon) \rceil$
6.     $d_u \leftarrow d_u/(1+\epsilon)$
7.     $i_u \leftarrow \mathrm{even}_u(d_u)$
8.     $i \leftarrow i + 1$
9. $i'_u \leftarrow \mathrm{even}_u(d_u(1+\epsilon))$
10. if $i'_u \ge 2$ then
11.     let $j$ be an even index in $0, \ldots, i'_u - 2$ that maximizes $\delta_j(u)$
12.     if $p_j(u) \in B_v$ then return $d_G(u,p_j(u)) + d_G(v,p_j(u))$
13.     if $p_{j+1}(v) \in B_u$ then return $d_G(u,p_{j+1}(v)) + d_G(v,p_{j+1}(v))$
14. if $p_{i'_u}(u) \in B_v$ then return $d_G(u,p_{i'_u}(u)) + d_G(v,p_{i'_u}(u))$
15. else return $d_G(u,p_{i'_u+1}(v)) + d_G(v,p_{i'_u+1}(v))$

Algorithm refine_further$(u, v, i_u)$

1. if $i_u \ge 2$ then
2.     let $j$ be an even index in $0, \ldots, i_u - 2$ that maximizes $\delta_j(u)$
3.     if $p_j(u) \in B_v$ or $p_{j+1}(v) \in B_u$ then return true
4. if $p_{i_u}(u) \in B_v$ or $p_{i_u+1}(v) \in B_u$ then return true
5. else return false

Figure 2: Algorithm refine_dist takes as input an $\alpha k$-approximate $uv$-distance $d_{uv}$ and
outputs a $(2(1+\epsilon)k - 1)$-approximate $uv$-distance.

We start with a generic algorithm, refine_dist, to refine a distance estimate. Later
we will show how to combine this with the Mendel-Naor oracle. We shall assume that
$1/\log(1+\epsilon) = o(\log k)$ since otherwise, the oracle of the previous section can be applied.
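
To get a feel for the constants involved, the short sketch below evaluates the bound $\lceil \log(2\alpha)/\log(1+\epsilon) \rceil$ on the number of refinement iterations. It assumes $\alpha = (1+\epsilon)\alpha_{MN}$ with $\alpha_{MN} = 128$, the values used later in the text; the function name is hypothetical.

```python
import math


def max_refinement_steps(alpha, eps):
    """Bound ceil(log(2*alpha) / log(1 + eps)) on the number of refinement iterations."""
    return math.ceil(math.log(2 * alpha) / math.log(1 + eps))


ALPHA_MN = 128  # stretch constant of the Mendel-Naor oracle, as stated in the text

for eps in (0.5, 0.25, 0.1):
    alpha = (1 + eps) * ALPHA_MN
    print(eps, max_refinement_steps(alpha, eps))
```

The count depends only on $\epsilon$ and the constant $\alpha_{MN}$, not on $n$ or $k$, which is what makes a constant query time possible for constant $\epsilon$.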

4.1 A generic algorithm 

For a vertex $u$ and a non-negative value $d_u$, we define $\mathrm{even}_u(d_u)$ as the largest even index
$i_u$ such that $d_G(u,p_{i_u}(u)) \le d_u$. Pseudocode of refine_dist can be found in Figure 2. The
following lemma shows that its output has the stretch we are aiming for.

Lemma 5. For $k \ge 4$, $\alpha > 1$, and $\epsilon > 0$, algorithm $\mathrm{refine\_dist}_{\alpha,\epsilon}(u,v,d_{uv})$ outputs a
$(2(1+\epsilon)k - 1)$-approximate $uv$-distance if $d_{uv}$ is an $\alpha k$-approximate $uv$-distance.

Proof. Initially, $d_G(u,v) \le d_{uv} = d_u$. If the test in line 3 succeeds then the same analysis as
in the proof of Lemma 4 shows that $d_G(u,v) \le d_u < d_G(u,p_{i_u+2}(u)) \le (i_u+2)d_G(u,v) \le
(k-1)d_G(u,v)$ (note that $i_u + 2 \le k-1$ since $p_{i_u+1}(v) \notin B_u$ which implies $i_u + 1 < k-1$
by Lemma 1) so assume that it fails.

We know that refine_further$(u, v, i'_u)$ returns true since $i'_u$ is the value of $i_u$ in the
iteration before the last. Hence, if a distance is returned in line 15, $p_{i'_u+1}(v) \in B_u$. In
particular, all distances returned are at least $d_G(u,v)$.

Assume first that the while-loop ended because refine_further$(u, v, i_u)$ returned false.
Observing the following string of inequalities in lines 10 to 15 will help us in the following:

$$d_G(u,p_{i_u}(u)) \le d_u < d_G(u,p_{i_u+2}(u)) \le d_G(u,p_{i'_u}(u)) \le d_u(1+\epsilon).$$

Now, we have $d_u < d_G(u,p_{i_u+2}(u)) \le (i_u+2)d_G(u,v)$. If lines 11 to 13 are executed then
$d_G(u,p_j(u)) \le d_G(u,p_{i'_u}(u)) \le d_u(1+\epsilon) < (1+\epsilon)(i_u+2)d_G(u,v)$. Thus, if $p_j(u) \in B_v$, a
value of at most

$$2d_G(u,p_j(u)) + d_G(u,v) < (2(1+\epsilon)(i_u+2)+1)d_G(u,v) \le (2(1+\epsilon)(k-1)+1)d_G(u,v) < (2(1+\epsilon)k-1)d_G(u,v)$$

is returned in line 12. If $p_j(u) \notin B_v$ and $p_{j+1}(v) \in B_u$, Lemma 1 gives $j + 1 \le k-1$ and

$$d_G(v,p_{j+1}(v)) \le d_G(v,p_j(u)) \le d_G(u,v) + d_G(u,p_j(u)) < ((1+\epsilon)(i_u+2)+1)d_G(u,v).$$

Furthermore, since $p_{j+1}(v) \in B_u$ and $j + 1 < i'_u$, we have

$$d_G(u,p_{j+1}(v)) < d_G(u,p_{j+2}(u)) \le d_G(u,p_{i'_u}(u)) \le d_u(1+\epsilon) < (1+\epsilon)(i_u+2)d_G(u,v).$$

Hence, a value of less than

$$(2(1+\epsilon)(i_u+2)+1)d_G(u,v) \le (2(1+\epsilon)(k-1)+1)d_G(u,v) < (2(1+\epsilon)k-1)d_G(u,v)$$

is returned in line 13. The same argument as for line 12 with $i'_u$ instead of $j$ shows that
the desired distance estimate is output in line 14. If we reach line 15, $p_{i'_u}(u) \notin B_v$ and (as
already observed) $p_{i'_u+1}(v) \in B_u$. Then $i_u + 2 \le i'_u \le k-2$ and

$$d_G(v,p_{i'_u+1}(v)) \le d_G(v,p_{i'_u}(u)) \le d_G(u,v) + d_G(u,p_{i'_u}(u)) \le d_G(u,v) + d_u(1+\epsilon)$$

$$< ((1+\epsilon)(i_u+2)+1)d_G(u,v) \le ((1+\epsilon)(k-2)+1)d_G(u,v) < ((1+\epsilon)k-1)d_G(u,v)$$

so a value of at most

$$2d_G(v,p_{i'_u+1}(v)) + d_G(u,v) < (2((1+\epsilon)k-1)+1)d_G(u,v) = (2(1+\epsilon)k-1)d_G(u,v)$$

is returned in line 15.

Now assume that the while-loop ended with refine_further$(u, v, i_u)$ returning true.
Then $i = \lceil \log(2\alpha)/\log(1+\epsilon) \rceil$ iterations have been executed so the final value of $d_u$
is at most $\alpha k\,d_G(u,v)/(1+\epsilon)^i \le \frac{k}{2}d_G(u,v)$. If the algorithm returns a value in line
12 then this value is at most $2d_G(u,p_j(u)) + d_G(u,v) \le 2d_u(1+\epsilon) + d_G(u,v) \le ((1+\epsilon)k+1)d_G(u,v)$.
If $p_j(u) \notin B_v$ and $p_{j+1}(v) \in B_u$ then $d_G(v,p_{j+1}(v)) \le d_G(v,p_j(u)) \le
d_G(u,v) + d_G(u,p_j(u)) \le d_G(u,v) + d_u(1+\epsilon)$ so a value of at most $2d_G(v,p_{j+1}(v)) +
d_G(u,v) \le 2d_u(1+\epsilon) + 3d_G(u,v) \le ((1+\epsilon)k+3)d_G(u,v)$ is returned in line 13. Since
$k \ge 4$, this gives the desired estimate. A similar argument gives the same estimate for
lines 14 and 15. This completes the proof. □
Algorithm $\mathrm{comb}_\epsilon(S)$

1. let $s_{\max}$ be the largest element of $S$
2. $S_\epsilon \leftarrow \{s_{\max}\}$; $S' \leftarrow S \setminus \{s_{\max}\}$
3. while $S' \ne \emptyset$
4.     let $s_1$ be the largest element of $S'$ and let $s_2$ be the smallest element of $S_\epsilon$
5.     $s \leftarrow \min\{s_1, s_2/(1+\epsilon)\}$
6.     $S_\epsilon \leftarrow S_\epsilon \cup \{s\}$
7.     remove all the elements from $S'$ that have value at least $s$
8. return $S_\epsilon$

Figure 3: Algorithm that outputs the $\epsilon$-comb of a non-empty set $S$ of real values.

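
The algorithm of Figure 3 translates almost line by line into code. The sketch below is one possible rendering under the assumption that all input values are positive; it uses exact rational arithmetic (`fractions.Fraction`) so that the factor $1+\epsilon$ between elements is not blurred by floating-point rounding. The function name is hypothetical.

```python
from fractions import Fraction


def comb(values, eps):
    """Return the eps-comb of a non-empty collection of positive values, in increasing order."""
    eps = Fraction(eps)
    remaining = sorted({Fraction(x) for x in values})  # S', kept in increasing order
    result = [remaining.pop()]                         # S_eps starts as {s_max}
    while remaining:
        s1 = remaining[-1]           # largest element still in S'
        s2 = result[-1]              # smallest element of S_eps so far
        s = min(s1, s2 / (1 + eps))
        result.append(s)
        # Line 7: remove all elements of S' with value at least s.
        while remaining and remaining[-1] >= s:
            remaining.pop()
    return result[::-1]
```

Because each new element is at most $s_2/(1+\epsilon)$, the most recently added element is always the smallest one, so a sorted input is processed in a single backward pass.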
4.2 Combining with the Mendel-Naor oracle 

Our oracle will query that of Mendel and Naor for a distance estimate and then give it
as input to an efficient implementation of refine_dist. We will keep a sorted list of values
such that for any distance query, the list contains the $O(1/\log(1+\epsilon))$ $d_u$-values found
in refine_dist as consecutive entries. We linearly traverse the list to identify these entries,
some of which point to $i_u$-indices needed by refine_dist. These pointers together with some
additional preprocessing allow us to execute each iteration of the while-loop in $O(1)$ time.

To ensure that list elements are spaced by a factor of at least $1+\epsilon$, we need a new
definition. Let $S$ be a non-empty set of real numbers and let $\epsilon > 0$ be given. Define the
$\epsilon$-comb of $S$ to be the set of real numbers obtained by the iterative algorithm $\mathrm{comb}_\epsilon(S)$
in Figure 3. Lemmas 6 and 8 below show that the $\epsilon$-comb of a certain superset of the
set of all distances that can be output by the Mendel-Naor oracle has the above property
while not containing too many elements.

Lemma 6. Let $S_\epsilon$ be the $\epsilon$-comb of a set $S$. Then

1. for any $s \in S$, there is a unique $s' \in S_\epsilon$ such that $s \le s' < (1+\epsilon)s$,

2. any two elements of $S_\epsilon$ differ by a factor of at least $1+\epsilon$, and

3. $|S_\epsilon| \le |S|$.

Proof. To show the first part, define $s^{(i)}$ to be the element $s$ found in the $i$th iteration of the
while-loop. Define $s_1^{(i)}$ and $s_2^{(i)}$ similarly. Now, let $s \in S$ be given. Since $s_{\max} \in S_\epsilon$, there
is an element of $S_\epsilon$ which is at least $s$. Let $s_{\min}$ be the smallest such element and suppose
for the sake of contradiction that $s_{\min} \ge (1+\epsilon)s$. Let $i$ be the iteration in which $s_{\min}$ is
added to $S_\epsilon$. Since $s < s^{(i)}$, $s = s_1^{(j)}$ for some $j \ge i+1$ so $s \le s_1^{(i+1)}$. After line 7 has been
executed, every element of $S'$ is strictly smaller than $s^{(i)} = s_{\min}$. Thus, $s \le s_1^{(i+1)} < s_{\min}$.
Since also $s_2^{(i+1)} = s^{(i)} = s_{\min} \ge (1+\epsilon)s$, it follows that $s \le s^{(i+1)} < s_{\min}$. But $s^{(i+1)} \in S_\epsilon$,
contradicting the choice of $s_{\min}$.

We have shown that $s \le s_{\min} < (1+\epsilon)s$. To show uniqueness, let $s'$ be the first
element added to $S_\epsilon$ for which $s \le s' < (1+\epsilon)s$. Assume for the sake of contradiction
that $s' \ne s_{\min}$. Then $s_{\min}$ was added in a later iteration than $s'$ so $s \le s_{\min} = s^{(i)} \le
s_2^{(i)}/(1+\epsilon) \le s'/(1+\epsilon) < s$, a contradiction. Thus, $s' = s_{\min}$, showing uniqueness.

The second part of the lemma holds since in line 5, $s_2$ is the smallest element of $S_\epsilon$
and the next element $s$ to be added to this set satisfies $s \le s_2/(1+\epsilon)$.

The third part of the lemma follows since in line 2, $|S_\epsilon| = 1$ and $|S'| = |S| - 1$ and since
at least one element (namely $s_1$) is removed from $S'$ in line 7 after an element has been
added to $S_\epsilon$. □

For any vertices $u$ and $v$, denote by $d_{MN}(u,v)$ the $uv$-distance estimate output by the
Mendel-Naor oracle and let $\alpha_{MN}k$ be the stretch achieved by the oracle, i.e., $\alpha_{MN} = 128$.
Let $\mathcal{D}_{MN} = \{d_{MN}(u,v) \mid u,v \in V\}$ be the set of all distances that the oracle can output.

Lemma 7. $|\mathcal{D}_{MN}| = O(n^{1+1/k})$.

Proof. The Mendel-Naor oracle stores trees representing certain ultrametrics. Each tree
node is labelled with a distance and each approximate distance output by the Mendel-Naor
oracle is one such label. Hence, since the oracle has size $O(n^{1+1/k})$, so has $\mathcal{D}_{MN}$. □

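Lemma 8. For each $d \in \mathcal{D}_{MN}$, let $\mathcal{D}_d = \{d/(1+\epsilon)^i \mid 0 \le i \le \lceil \log(2\alpha_{MN}(1+\epsilon))/\log(1+\epsilon) \rceil\}$
and let $\mathcal{D}_\epsilon$ be the $\epsilon$-comb of $\bigcup_{d \in \mathcal{D}_{MN}} \mathcal{D}_d$. Then for each $d \in \mathcal{D}_{MN}$, there exists a unique
$d' \in \mathcal{D}_\epsilon$ such that $d \le d' < d(1+\epsilon)$ and $d'/(1+\epsilon)^i \in \mathcal{D}_\epsilon$ for
$0 \le i \le \lceil \log(2\alpha_{MN}(1+\epsilon))/\log(1+\epsilon) \rceil$. Furthermore, $|\mathcal{D}_\epsilon| = O(n^{1+1/k}/\log(1+\epsilon))$.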

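Proof. The existence and uniqueness of $d'$ follows from $\mathcal{D}_{MN} \subseteq \bigcup_{d \in \mathcal{D}_{MN}} \mathcal{D}_d$ and from
part 1 of Lemma 6. Define $d_i = d/(1+\epsilon)^i$ and $d'_i = d'/(1+\epsilon)^i$. We use induction on
$i \ge 0$ to show that $d'_i \in \mathcal{D}_\epsilon$. The base case $i = 0$ has been shown since $d'_0 = d'$ so
assume $0 < i \le \lceil \log(2\alpha_{MN}(1+\epsilon))/\log(1+\epsilon) \rceil$ and that $d'_{i-1} \in \mathcal{D}_\epsilon$. Consider the iteration
of $\mathrm{comb}_\epsilon(\bigcup_{d \in \mathcal{D}_{MN}} \mathcal{D}_d)$ following that in which $d'_{i-1}$ was added to $S_\epsilon$. Here, $s_1 \ge d_{i-1}$
since $d_{i-1} \in S'$ and so $s_2 = d'_{i-1} = d'_i(1+\epsilon) < d_{i-1}(1+\epsilon) \le s_1(1+\epsilon)$, giving $s =
\min\{s_1, s_2/(1+\epsilon)\} = s_2/(1+\epsilon) = d'_i$ which is added to $S_\epsilon$ in line 6. Hence, $d'_i \in \mathcal{D}_\epsilon$,
completing the induction step.

For the last part of the lemma, since $\log(2\alpha_{MN}(1+\epsilon))/\log(1+\epsilon) = O(1/\log(1+\epsilon))$,
Lemma 7 and part 3 of Lemma 6 give

$$|\mathcal{D}_\epsilon| \le \sum_{d \in \mathcal{D}_{MN}} |\mathcal{D}_d| = O(|\mathcal{D}_{MN}|/\log(1+\epsilon)) = O(n^{1+1/k}/\log(1+\epsilon)).$$

□
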
As mentioned earlier, certain elements of the $\epsilon$-comb in Lemma 8 contain pointers to
$i_u$-indices. These pointers are defined by the following type of map. For a set $S$ of real
values with smallest element $s_{\min}$, define $\tau_S : [s_{\min}, \infty) \to S$ by $\tau_S(x) = \max\{s \in S \mid s \le x\}$.

Lemma 9. Let $S$ be a set of real values with smallest element $s_{\min}$ and let $x, y \in [s_{\min}, \infty)$.
If $s_1 < s_2$ are consecutive elements in $S$ then $\tau_S(x) = \tau_S(y) = s_1$ iff $x, y \in [s_1, s_2)$.
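
For a finite set stored as a sorted list, the map $\tau_S$ is a predecessor search. A minimal sketch using binary search is given below; the oracle itself relies on precomputed pointers rather than a search at query time, so this is only meant to illustrate the definition. The function name is hypothetical.

```python
from bisect import bisect_right


def tau(sorted_s, x):
    """Return max{s in S | s <= x}, where sorted_s lists S in increasing order."""
    idx = bisect_right(sorted_s, x) - 1   # last position holding a value <= x
    if idx < 0:
        raise ValueError("x lies below the smallest element of S")
    return sorted_s[idx]
```

For example, with `S = [1, 4, 9]`, every `x` in the half-open interval from 1 to 4 is mapped to 1, in line with Lemma 9.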



4.3 Preprocessing 

We are now ready to give an efficient implementation of algorithm refine_dist. We construct
the Mendel-Naor oracle and obtain the set $\mathcal{D}_{MN}$. For each vertex $u$, we construct bunch
$B_u$ and the set $P_u$ of values $d_G(u,v)$ for each $v \in B_u$. We represent $P_u$ as a list sorted
by increasing value. Furthermore, we find a set $S_u$ of real values as follows. For each
index $i \in \{0, \ldots, |P_u| - 2\}$ of $P_u$, subdivide interval $[P_u[i], P_u[i+1]]$ into four equal-length
subintervals. We denote by $\mathcal{I}_u$ the set of these subintervals over all $i$ and form the set
$S_u$ of all their endpoints. We obtain the $\epsilon$-comb $\mathcal{D}_\epsilon$ as defined in Lemma 8 and represent
it as a sorted list. Then we form a set $\mathcal{D}_\epsilon(u)$ of those $d \in \mathcal{D}_\epsilon$ for which $d$ is either
the smallest or the largest element that $\tau_{S_u}$ maps to $\tau_{S_u}(d)$; see Figure 4. With each
$d \in \mathcal{D}_\epsilon(u)$, we associate the largest even index $i_u(d)$ such that $d_G(u,p_{i_u(d)}(u)) \le \tau_{S_u}(d)$.
For all $d \in \mathcal{D}_\epsilon \setminus \mathcal{D}_\epsilon(u)$, we leave $i_u(d)$ undefined.
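
The construction of the endpoint set for one vertex can be sketched as follows, assuming the sorted list of bunch distances is already available. Exact rationals are used so that the quarter points are represented without rounding; the function name is hypothetical.

```python
from fractions import Fraction


def subdivision_endpoints(p_u):
    """Given the sorted bunch distances of a vertex, return the sorted endpoint set.

    Each interval between consecutive distances is cut into four pieces of
    equal length, and all endpoints of these pieces are collected.
    """
    s_u = set()
    for lo, hi in zip(p_u, p_u[1:]):
        for t in range(5):  # five endpoints delimit four equal-length pieces
            s_u.add(Fraction(lo) + Fraction(hi - lo) * Fraction(t, 4))
    return sorted(s_u)
```

Each interval contributes at most five endpoints, so the endpoint set is only a constant factor larger than the list of bunch distances.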
Figure 4: Sets $\bigcup_{d \in \mathcal{D}_{MN}} \mathcal{D}_d$, $S_u$, and $P_u$ (ordered by increasing value from left to right)
as well as the map $\tau_{S_u}$ restricted to the subset $\mathcal{D}_\epsilon(u)$ (white points) of $\mathcal{D}_\epsilon$. Elements of
$\bigcup_{d \in \mathcal{D}_{MN}} \mathcal{D}_d$ represented by long line segments are those belonging to $\mathcal{D}_{MN}$. For clarity,
elements of each set $\mathcal{D}_d$ from Lemma 8 are evenly spaced in the figure.



4.4 Query 

To answer an approximate $uv$-distance query, we first obtain the Mendel-Naor estimate
$d_{MN}(u,v)$ and identify the smallest element $d_{uv}$ of $\mathcal{D}_\epsilon$ which is at least $d_{MN}(u,v)$. This
element is the input to $\mathrm{refine\_dist}_{\alpha,\epsilon}$ where $\alpha = (1+\epsilon)\alpha_{MN}$. By Lemma 8, $d_{uv}$ is an
$\alpha k$-approximate distance so the output will be a $(2(1+\epsilon)k - 1)$-approximate distance.

It follows from Lemma 8 and part 2 of Lemma 6 that all values of $d_u$ in refine_dist are
consecutive and start from $d_{uv}$ in $\mathcal{D}_\epsilon$. Linearly traversing the list from $d_{uv}$ thus corresponds
to updating $d_u$ in the while-loop.

We also need to maintain even index $i_u$. Assume for now that for the initial $d_u$, index
$i_u(d_u)$ is defined. Then the initial $i_u$ is $i_u(d_u)$. As $d_u$ is updated in the while-loop, at some
point it may happen that $i_u(d_u)$ is undefined. Let $d'_u$ be the last value encountered in the
linear traversal such that $i_u(d'_u)$ is defined. Then $d'_u$ is the largest element in $\mathcal{D}_\epsilon$ that $\tau_{S_u}$
maps to $\tau_{S_u}(d'_u)$ and $d_u$ is larger than the smallest such element. Hence, $\tau_{S_u}(d_u) = \tau_{S_u}(d'_u)$
and it follows from Lemma 9 that $i_u$ need not be updated from the value it had when $d'_u$
was encountered. Thus, maintaining $i_u$ is easy, assuming its initial value can be identified.

What if $i_u(d_u)$ is undefined for the initial $d_u$? Then we move down the list until we
find an index $i_u(d'_u)$ that is defined. By Lemma 9, this index is the initial value of $i_u$ and
we are done. The problem with this approach is that we may need to traverse a large part
of the list before the index can be found. We can only afford to traverse $O(1/\log(1+\epsilon))$
entries of $\mathcal{D}_\epsilon$. The following lemma shows that if the search has not identified an index
$i_u(d_u)$ after a small number of steps then our oracle can output twice the distance value
in the final entry considered.

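Lemma 10. For vertices $u$ and $v$, let $j$ be the index of $\mathcal{D}_\epsilon$ such that $\mathcal{D}_\epsilon[j] = d_{uv}$. Assume
that $j_{\min} = j - \lceil \log(2\alpha_{MN})/\log(1+\epsilon) \rceil$ is an index of $\mathcal{D}_\epsilon$ such that $i_u(\mathcal{D}_\epsilon[j'])$ and $i_v(\mathcal{D}_\epsilon[j'])$
are undefined for all $j_{\min} \le j' \le j$. Then $d_G(u,v) \le 2\mathcal{D}_\epsilon[j_{\min}] \le (1+\epsilon)k\,d_G(u,v)$.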

Proof. We have $d_G(u,v) \le \mathcal{D}_\epsilon[j] \le (1+\epsilon)\alpha_{MN}k\,d_G(u,v)$. For each index $j' > 0$ of $\mathcal{D}_\epsilon$,
$\mathcal{D}_\epsilon[j'-1] = \mathcal{D}_\epsilon[j']/(1+\epsilon)$ by Lemma 8 and part 2 of Lemma 6. Thus,

$$\mathcal{D}_\epsilon[j_{\min}] = \frac{\mathcal{D}_\epsilon[j]}{(1+\epsilon)^{j-j_{\min}}} \le \frac{(1+\epsilon)\alpha_{MN}k}{(1+\epsilon)^{\log(2\alpha_{MN})/\log(1+\epsilon)}}\,d_G(u,v) = \frac{(1+\epsilon)k}{2}\,d_G(u,v),$$

showing the second inequality of the lemma.

To show the first inequality, let $I \in \mathcal{I}_u$ be the interval containing $\mathcal{D}_\epsilon[j]$. Then it follows
from Lemma 9 that $\mathcal{D}_\epsilon[j'] \in I$ for every $j'$ satisfying the condition in the lemma. Recalling
our assumption $\epsilon \le \frac{1}{2} < 1 - 1/\alpha_{MN}$, we get $(1+\epsilon)^{j-j_{\min}} \ge 2\alpha_{MN} > 2/(1-\epsilon)$ so

$$\mathcal{D}_\epsilon[j] - \mathcal{D}_\epsilon[j_{\min}] = \left(1 - \frac{1}{(1+\epsilon)^{j-j_{\min}}}\right)\mathcal{D}_\epsilon[j] > \left(1 - \frac{1-\epsilon}{2}\right)\mathcal{D}_\epsilon[j] \ge \frac{1}{2}d_G(u,v)$$

and since $\mathcal{D}_\epsilon[j], \mathcal{D}_\epsilon[j_{\min}] \in I$, $I$ must have length $> \frac{1}{2}d_G(u,v)$. Let $j_u$ be the index of $P_u$
such that interval $I_u = [P_u[j_u], P_u[j_u+1]]$ contains $I$. Since $I$ is one of four consecutive
subintervals of $I_u$ of equal length, $I_u$ has length $> 2d_G(u,v)$. Also, $P_u[j_u] \le \mathcal{D}_\epsilon[j_{\min}]$.

Similarly, there is an index $j_v$ of $P_v$ such that $I_v = [P_v[j_v], P_v[j_v+1]]$ has length
$> 2d_G(u,v)$ and $P_v[j_v] \le \mathcal{D}_\epsilon[j_{\min}]$.

Let $j$ be the final index of $\mathrm{dist}_k(u,v,0)$ (corresponding to a $uv$-query to the Thorup-
Zwick oracle). Assume it is even (the odd case is handled in a similar manner). Then
$d_G(u,p_{j'+2}(u)) - d_G(u,p_{j'}(u)) \le 2d_G(u,v)$ for all even $j' \le j-2$ (using an observation
similar to that in the proof of Lemma 4). By the above, $P_u[j_u] \ge d_G(u,p_j(u))$. We also
have $d_G(v,p_{j'+2}(v)) - d_G(v,p_{j'}(v)) \le 2d_G(u,v)$ for all odd $j' \le j-3$ so again by the above,
$P_v[j_v] \ge d_G(v,p_{j-1}(v))$. Finally, since $p_{j-1}(v) \notin B_u$,

$$d_G(v,p_j(u)) \le d_G(u,v) + d_G(u,p_j(u)) \le d_G(u,v) + d_G(u,p_{j-1}(v)) \le 2d_G(u,v) + d_G(v,p_{j-1}(v)).$$

Thus, $d_G(v,p_j(u)) - d_G(v,p_{j-1}(v)) \le 2d_G(u,v)$ and since $p_j(u) \in B_v$, we have $d_G(v,p_j(u)) \in
P_v$. Also, $d_G(v,p_{j-1}(v)) \in P_v$ so since $P_v[j_v] \ge d_G(v,p_{j-1}(v))$, we get $P_v[j_v] \ge d_G(v,p_j(u))$.
We can now conclude the proof with the first inequality of the lemma:

$$d_G(u,v) \le d_G(u,p_j(u)) + d_G(v,p_j(u)) \le P_u[j_u] + P_v[j_v] \le 2\mathcal{D}_\epsilon[j_{\min}].$$

□

4.5 Running time and space 

We now bound the time and space of our oracle. 

Preprocessing: Constructing the Mendel-Naor oracle takes 0(n^"^^/^ log n) time and 
requires 0(77,^"*"^/^') space. Traversing the nodes of the trees kept by the oracle identifies all 
distances in time proportional to their number which by Lemma [7] is 0{n^~^^^''). Sorting 
them to get the list representation of T>mn then takes 0{'n}'^^/^ logn) time. 

Forming a sorted list of the values from the union, over $d \in \mathcal{D}_{MN}$, of the sets in Lemma [H] can be done in
$O((|\mathcal{D}_{MN}|/\log(1+\epsilon))\log n) = O(\frac{1}{\epsilon} n^{1+1/k}\log n)$ time and requires $O(\frac{1}{\epsilon} n^{1+1/k})$ space.
Clearly, when the input to comb$_\epsilon$ is given as a sorted list, the algorithm can be implemented
to run in time linear in the length of the list. Thus, computing a sorted list of
the values of $\mathcal{D}_\epsilon$ can be done in $O(\frac{1}{\epsilon} n^{1+1/k}\log n)$ time.

By the analysis of Thorup and Zwick, forming bunches $B_u$ takes $O(kmn^{1/k})$ time. Since
these bunches have total size $O(kn^{1+1/k})$, sorted lists $P_u$ can be found in $O(kn^{1+1/k}\log n)$
time. Sets $S_u$ can be found within the same time bound.

Forming $\mathcal{D}_\epsilon(u)$-sets can be done by two linear traversals of the sorted list $L$ of values
from $\mathcal{D}_\epsilon \cup \bigcup_{u \in V} S_u$. The first traversal visits elements in decreasing order. Whenever
we encounter a $d$ from a set $S_u$, let $d'$ be the previous visited element of $S_u$ ($d' = \infty$ if
no such element exists) and let $d''$ be the latest visited element of $\mathcal{D}_\epsilon$. If $d \le d'' < d'$,
$d''$ is the smallest element of $\mathcal{D}_\epsilon$ that $\tau_{S_u}$ maps to $\tau_{S_u}(d'') = d$ so we add $d''$ to $\mathcal{D}_\epsilon(u)$.
Otherwise we do nothing as $\tau_{S_u}$ maps no element of $\mathcal{D}_\epsilon$ to $d$. The second traversal visits
elements in increasing order. When we encounter a $d \in S_u$, let $d'$ be the predecessor of
$d$ in $S_u$ ($d' = -\infty$ if no such element exists) and let $d''$ be the latest visited element of
$\mathcal{D}_\epsilon$. Then, assuming $d' \le d'' < d$, $d''$ is the largest element that $\tau_{S_u}$ maps to $\tau_{S_u}(d'') = d'$
and so we add $d''$ to $\mathcal{D}_\epsilon(u)$. Together, these two traversals form all $\mathcal{D}_\epsilon(u)$-sets in time

$$O\Bigl(|\mathcal{D}_\epsilon| + \sum_{u \in V} |S_u|\Bigr).$$
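The two traversals can be sketched in Python as follows. This is only an illustration under stated assumptions, not the paper's implementation: the name `boundary_sets` is hypothetical, the map from values to elements of $S_u$ is assumed to send $x$ to the largest element of $S_u$ that is at most $x$, ties between the two kinds of values are broken accordingly, and the merged list is sorted inside the function rather than being supplied already sorted.

```python
import math
from collections import defaultdict

def boundary_sets(D, S):
    """D: values of the global sorted list; S: dict u -> values of S_u.
    Returns dict u -> set of elements of D picked up by the two traversals.
    Assumption: x is mapped to the largest element of S_u that is <= x."""
    # Merged list L of tagged values: (value, from_S flag, owner u).
    items = [(x, 0, None) for x in set(D)]
    items += [(d, 1, u) for u, Su in S.items() for d in set(Su)]
    out = defaultdict(set)

    # First traversal: decreasing order. On ties, D-values are visited before
    # S-values so that a D-value equal to d counts as already visited.
    last_D, prev_S = None, {}
    for val, from_S, u in sorted(items, key=lambda t: (-t[0], t[1])):
        if not from_S:
            last_D = val
            continue
        d_prime = prev_S.get(u, math.inf)   # previously visited element of S_u
        if last_D is not None and val <= last_D < d_prime:
            out[u].add(last_D)              # smallest D-value mapped to val
        prev_S[u] = val

    # Second traversal: increasing order. On ties, S-values are visited first
    # so that a D-value equal to d is not attributed to d's predecessor.
    last_D, prev_S = None, {}
    for val, from_S, u in sorted(items, key=lambda t: (t[0], -t[1])):
        if not from_S:
            last_D = val
            continue
        d_prime = prev_S.get(u, -math.inf)  # predecessor of val in S_u
        if last_D is not None and d_prime <= last_D < val:
            out[u].add(last_D)              # largest D-value mapped to d_prime
        prev_S[u] = val

    return dict(out)
```

Apart from the initial sorting, each traversal does constant work per element of the merged list, which matches the linear bound stated above when the list is given in sorted order.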

Since each element of each set $S_u$ is associated with at most two elements of $\mathcal{D}_\epsilon(u)$,
we get a space bound of $O(kn^{1+1/k})$ for sets $\mathcal{D}_\epsilon(u)$. In the two traversals, we can easily
identify $i_u(d)$, $d \in \mathcal{D}_\epsilon(u)$, without an asymptotic increase in time. We represent each of
these index maps as hash functions in the same way as bunches are represented in the
Thorup-Zwick oracle. These hash functions do not increase space.

Query: To answer a $uv$-query, we need an efficient implementation of algorithm refine_dist.
The while-loop consists of $O(1/\epsilon)$ iterations. Sub-routine refine_further can be implemented
to run in constant time assuming we have precomputed, for each $u$ and each even
index $i \ge 2$, the even index $j$ in $0, \ldots, i-2$ that maximizes $\delta_j$. This preprocessing can
easily be done in $O(kn)$ time. It then follows that refine_dist runs in $O(1/\epsilon)$ time and we
can conclude with our second main result.
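The precomputation used by refine_further amounts to a running maximum over even indices. Below is a minimal Python sketch for a single vertex; the function name `even_prefix_argmax` and the input list `delta` (holding that vertex's values $\delta_0, \delta_1, \ldots$) are illustrative assumptions, not names from the paper.

```python
def even_prefix_argmax(delta):
    """For each even index i >= 2, return the even index j in 0, ..., i-2
    that maximizes delta[j] (earliest index wins on ties)."""
    best = {}
    arg = None  # argmax over the even indices strictly below the current one
    for i in range(0, len(delta), 2):
        if i >= 2:
            best[i] = arg
        if arg is None or delta[i] > delta[arg]:
            arg = i
    return best
```

A single pass over the $O(k)$ indices of one vertex suffices, so running it for every vertex is consistent with the $O(kn)$ preprocessing bound, and each later lookup is a constant-time table access.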

Theorem 2. For any integer $k > 1$ and any $0 < \epsilon < 1$, a $((2+\epsilon)k)$-approximate distance
oracle of $G$ of size $O(kn^{1+1/k})$ and query time $O(1/\epsilon)$ can be constructed in $O(n^{2+1/k}\log n)$
time. For $k = O(\log n/\log\log n)$ and constant $\epsilon$, space can be improved to $O(n^{1+1/k})$.

Proof. We may assume that k > 4 since otherwise we can apply the Thorup-Zwick oracle 
or our $O(\log k)$ query time oracle. Apply Lemma [5] and Lemma [10] with $\epsilon'$ = < |
instead of $\epsilon$. Then we get stretch $(2+\epsilon)k$, size $O(kn^{1+1/k})$, and query time $O(1/\epsilon)$. This
shows the first part of the theorem.

To show the second part, apply the first part with $\epsilon_1 = \epsilon/2$ instead of $\epsilon$ and $k' = k(1+\epsilon_2)$
instead of $k$, where $\epsilon_2 = \epsilon/(4+\epsilon)$ (we assume here for simplicity that $k(1+\epsilon_2)$ is an integer).
Then $(2+\epsilon_1)k' = (2+\epsilon)k$ so we get the desired stretch. Size is $O(k'n^{1+1/k'}) = O(kn^{1+1/k'})$.
Letting $\epsilon_3 = \epsilon_2/(1+\epsilon_2)$, we have $1/k' = (1-\epsilon_3)/k$ so we get size $O(n^{1+1/k})$ if $kn^{-\epsilon_3/k} \le 1$,
i.e., if $k\log k \le \epsilon_3\log n$. The latter holds when $k = O(\log n/\log\log n)$. □
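The parameter choices in the second part of the proof can be checked directly. The short derivation below is a sketch that uses only the quantities named in the proof ($k'$, $\epsilon_2$, $\epsilon_3$); solving the stretch identity for $\epsilon_1$ is our own filling-in of that step.

```latex
\begin{align*}
1+\epsilon_2 &= \frac{4+2\epsilon}{4+\epsilon}, \\
2+\epsilon_1 &= \frac{(2+\epsilon)k}{k'} = \frac{2+\epsilon}{1+\epsilon_2}
  = \frac{(2+\epsilon)(4+\epsilon)}{2(2+\epsilon)} = 2+\frac{\epsilon}{2}, \\
\frac{1}{k'} &= \frac{1}{k}\cdot\frac{1}{1+\epsilon_2}
  = \frac{1}{k}\Bigl(1-\frac{\epsilon_2}{1+\epsilon_2}\Bigr) = \frac{1-\epsilon_3}{k}, \\
k\,n^{1+1/k'} &= n^{1+1/k}\cdot k\,n^{-\epsilon_3/k} \le n^{1+1/k}
  \iff k\log k \le \epsilon_3\log n .
\end{align*}
```

In particular $\epsilon_3 = \epsilon/(4+2\epsilon)$, which is a constant whenever $\epsilon$ is.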

4.6 Faster preprocessing 

In this subsection, we show how to improve the $O(n^{2+1/k}\log n)$ preprocessing bound
in Theorem [2]. First, we can replace the Mendel-Naor oracle with that of Mendel and
Schwob [6]. This follows since the latter also uses ultrametric representations of approxi- 
mate shortest path distances so the proof of Lemma [7] still holds. This modification alone 
gives a preprocessing bound of 0{mn}/^ log^ n). 

Next, observe that our result holds for any $O(k)$-approximate distance $d_{MN}(u,v)$ output
and not just for $\alpha_{MN} = 128$. More precisely, let $C > 1$ be an integer. If $d_{MN}(u,v)$ has
stretch $Ck$ then it follows from our analysis that this estimate can be refined to $(2+\epsilon)k$
in $O(\log C/\epsilon)$ iterations and we get preprocessing time 0{mn^/'^^^^ log^ n) and query time
$O(\log C/\epsilon)$. In addition to this, we need to construct bunches and form sorted lists $P_u$. As
shown earlier, this can be done in $O(kmn^{1/k} + kn^{1+1/k}\log n)$ time. Combining this with
the above gives the following improvement in preprocessing over that in Theorem [2].

Theorem 3. For any integers $k > 3$ and $C > 2$ and any $0 < \epsilon < 1$, a $((2+\epsilon)k)$-approximate
distance oracle of $G$ of size $O(kn^{1+1/k})$ and query time $O(\log C/\epsilon)$ can be constructed
in $O(kmn^{1/k} + kn^{1+1/k}\log n$ + mn^/^*^*^) log^ n) time. For $k = O(\log n/\log\log n)$
and constant $\epsilon$, space can be improved to $O(n^{1+1/k})$.
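To see where an iteration count of the form $O(\log C/\epsilon)$ can come from, here is a back-of-the-envelope calculation. It assumes (this subsection does not spell it out) that each refinement iteration reduces the current stretch by a factor of at least $1+\epsilon$; $t$ is our name for the number of iterations.

```latex
\begin{align*}
\frac{Ck}{(1+\epsilon)^{t}} \le (2+\epsilon)k
  &\iff t \ge \frac{\ln\bigl(C/(2+\epsilon)\bigr)}{\ln(1+\epsilon)}, \\
\ln(1+\epsilon) \ge \frac{\epsilon}{2} \quad (0<\epsilon<1)
  &\implies t = \Bigl\lceil \frac{2\ln C}{\epsilon} \Bigr\rceil
  \text{ suffices, i.e. } t = O(\log C/\epsilon).
\end{align*}
```

Under the same assumption, a constant $C$ such as 128 gives $O(1/\epsilon)$ iterations, in line with the while-loop bound stated for refine_dist.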
5 Concluding Remarks



We gave a size $O(kn^{1+1/k})$ oracle with $O(\log k)$ query time for stretch $(2k-1)$-distances,
improving the $O(k)$ query time of Thorup and Zwick. Furthermore, for any positive $\epsilon < 1$,
we gave an oracle with stretch $(2+\epsilon)k$ which answers distance queries in $O(1/\epsilon)$ time.
This improves the result of Mendel and Naor which answers stretch $128k$-distances in $O(1)$
time.

For the first oracle, can we go beyond the $O(\log k)$ query bound? And can space be
improved to $O(n^{1+1/k})$? For the second oracle, can stretch be improved to $2k-1$ while
keeping $O(1)$ query time? To our knowledge, the oracle of Mendel and Naor cannot be
used to produce approximate shortest paths, only distances. Our second oracle then has
the same drawback (due to Lemma [10]). What can be done to deal with this?